{ "cells": [ { "cell_type": "markdown", "id": "9d9187a2", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "We will use the Transformers library from HuggingFace which is pip-installable:\n", "\n", "pip install transformers\n", "\n", "You'll also probably want to use PyTorch" ] }, { "cell_type": "markdown", "id": "a578da5f", "metadata": {}, "source": [ "## Exercise 1: Tokenization and Exbedding Exploration\n", "\n", "The aim of this exercise is to visualize how text is broken down into tokens and converted into embeddings. \n", "\n", "1) Create a short ten word sentence\n", "2) Tokenize it using a tokenizer from the Hugging Face model bert-base-uncased\n", "3) Decode the tokens back into words\n", "4) Use the model's embedding layer to project tokens into vectors\n", "5) Visualize the embeddings using PCA" ] }, { "cell_type": "code", "execution_count": null, "id": "cf34ac2b", "metadata": {}, "outputs": [], "source": [ "from transformers import AutoTokenizer, AutoModel\n", "import torch" ] }, { "cell_type": "markdown", "id": "ccde3d1a", "metadata": {}, "source": [ "## Exercise 2: Build Your Own Scaled Dot-Product Attention\n", "\n", "This exercise gets you familiar with the attention mechanism from scratch on small data.\n", "\n", "1) Generate small random matrices for queries, keys, and values\n", "2) Implement the scaled dot-product attention:\n", "\n", "$ Attention(Q, K, V) = softmax \\left( \\frac{QK^T}{\\sqrt{d_k}} \\right) V $\n", "\n", "3) Visualize the attention weights as a heatmap" ] }, { "cell_type": "markdown", "id": "3f62cf15", "metadata": {}, "source": [ "## Exercise 3: Multi-Head Attention \n", "\n", "This exercise shows how multi-head attention works by implementing a simplified version with synthetic data.\n", "\n", "Repeat Ex. (2) with a synthetic input of 3 tokens, each with an 8-d embedding and 3 attention heads" ] }, { "cell_type": "markdown", "id": "ed46c9b1", "metadata": {}, "source": [ "## Exercise 4: Explore Attention on a Sentence\n", "\n", "Here we will see how each word in a sentence attends to other in context.\n", "\n", "1) Input a sentence into the DistilBERT model\n", "2) Extract the attention weights from one or more layers\n", "3) Use a heat map to visualize attention across words\n", "\n", "Q. In your sentence, which words focus on others\n", "\n", "Q. How does this vary between layers" ] }, { "cell_type": "code", "execution_count": null, "id": "2fe32ef1", "metadata": {}, "outputs": [], "source": [ "from transformers import DistilBertModel, DistilBertTokenizer" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }